Accusoft.SmartZoneOCR4.Net
Define and Edit Character Sets

Define Character Sets

SmartZone OCR lets you define one or more character sets with the CharacterSet class. Use its Add and Remove methods to add or remove single characters, multiple characters in a string, or pre-defined character sets in your current character set collection, creating subsets as needed.

To see if a character is in the current Character Set, use the Contains method.
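The add, remove, and contains behavior described above can be modeled in a few lines. The sketch below is a conceptual model written in Python, not the SmartZone OCR API: the class name `CharacterSetModel`, its method names, and its signatures are hypothetical and exist only to illustrate the idea.

```python
class CharacterSetModel:
    """Hypothetical stand-in for a character set; NOT the Accusoft CharacterSet class."""

    def __init__(self, chars=""):
        self._chars = set(chars)

    def add(self, chars):
        # Accepts a single character, a string of characters, or another model.
        self._chars |= self._as_set(chars)

    def remove(self, chars):
        # Removing characters that are not present is a no-op in this model.
        self._chars -= self._as_set(chars)

    def contains(self, ch):
        return ch in self._chars

    @staticmethod
    def _as_set(chars):
        if isinstance(chars, CharacterSetModel):
            return set(chars._chars)
        return set(chars)


DIGITS = CharacterSetModel("0123456789")

field = CharacterSetModel()
field.add(DIGITS)    # add a whole predefined-style set
field.add("-/")      # add several characters from a string
field.remove("0")    # drop a character that is not expected
```

In this model, `field.contains("7")` is true and `field.contains("0")` is false after the removal. The real class may differ in naming and behavior.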

Language Support for Character Sets

If you have specified a language, it will be used to refine the contents of any character set containing alphabetic entries. For example, specifying a language of Italian and a character set of Alphanumeric would limit the returned results to only letters included in the Italian alphabet plus digits.
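One way to picture this refinement is as a set intersection between the character set and the language. The following Python sketch is a model under stated assumptions, not the engine's implementation: the function `refine` is hypothetical, AlphaNumeric is approximated from a few of the letters listed in the language table in this section, and punctuation is left out of the language for brevity.

```python
import string

# English letters and digits.
ENGLISH_LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# Extra letters that this guide lists for Italian and German.
ITALIAN_EXTRA = set("ÀÈÉÌÒÙàèéìòù")
GERMAN_EXTRA = set("ÄÖÜäöüß")

# Assumption: model AlphaNumeric as every letter above plus digits.
ALPHANUMERIC = ENGLISH_LETTERS | ITALIAN_EXTRA | GERMAN_EXTRA | DIGITS

# Assumption: the language universe omits punctuation here for brevity.
ITALIAN = ENGLISH_LETTERS | ITALIAN_EXTRA | DIGITS


def refine(character_set, language):
    """Keep only the characters that the language also contains (hypothetical helper)."""
    return character_set & language


italian_alphanumeric = refine(ALPHANUMERIC, ITALIAN)
```

In this model, `italian_alphanumeric` keeps è and the digits but drops German-only letters such as ß and ä.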

SmartZone OCR supports the following languages:

Language Description
English Is: a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 ! " # % & ' ( ) * , - . / : ; ? @ [ \ ] _ { | } $ ¢ £ ¥ € + < = >
French Is the English character set plus: « » À Â Ç È É Ê Ë Î Ï Ô Ù Û Ü à â ç è é ê ë î ï ô ù û ü
Spanish Is the English character set plus: « » ¡ ¿ Á É Í Ñ Ó Ú Ü á é í ñ ó ú ü
Italian Is the English character set plus: « » À È É Ì Ò Ù à è é ì ò ù
German Is the English character set plus: « » „ Ä Ö Ü ä ö ü ß
Dutch Is the English character set plus: À Á Â Ä Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ö Ù Ú Û Ü à á â ä ç è é ê ë ì í î ï ñ ò ó ö ù ú ü
Portuguese Is the English character set plus: À Á Â Ç È É Ê Í Ò Ó Ô Õ Ú Ü à á â ç è é ê í ò ó ô õ ú ü
Norwegian Is the English character set plus: « » Å Æ Ø å æ ø
Finnish Is the English character set plus: « » Å Ä Ö å ä ö
Danish Is the English character set plus: « » „ Å Æ Ø å æ ø
Swedish Is the English character set plus: « » Å Ä Ö å ä ö
Western European Is all the supported characters in English, French, Spanish, Italian, German, Dutch, Portuguese, Norwegian, Finnish, Danish, and Swedish.

For the best recognition accuracy, first define a universe of possible returned values that covers every character you expect, then narrow it as far as possible by applying the pre-defined character sets listed below. Character sets are used to limit (reduce) the possible returned values once that universe is defined. For example, because é is not included in the English character set, accurately reading the word "Résumé" requires a language that includes é, such as French, which includes all English letters plus é. You can then improve recognition further by omitting any other characters you do not expect to encounter.
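The "Résumé" example can be checked with a small model. This Python sketch is illustrative only: the helper `unreadable` is hypothetical, and the language sets contain only letters and digits (punctuation and guillemets are omitted for brevity).

```python
import string

ENGLISH = set(string.ascii_letters + string.digits)
# Letters this guide lists for French in addition to English.
FRENCH = ENGLISH | set("ÀÂÇÈÉÊËÎÏÔÙÛÜàâçèéêëîïôùûü")


def unreadable(word, universe):
    """Return the characters of `word` that the universe cannot produce."""
    return sorted(set(word) - universe)


word = "Résumé"
missing_in_english = unreadable(word, ENGLISH)
missing_in_french = unreadable(word, FRENCH)

# Narrow the universe: a word field is not expected to contain digits.
narrowed = FRENCH - set(string.digits)
```

With English as the universe, é is the only character that cannot be returned. With French, nothing is missing, and removing the digits narrows the set without losing any character of the word.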

Predefined Character Sets

There are 12 additional pre-defined character sets available as properties:

# Predefined Character Set Description
1 AllAlphas Includes all upper and lower case alpha characters.
2 AllCharacters Includes all upper and lower case alpha, all digits, punctuation, currency and arithmetic characters.
3 AlphaNumeric Includes all upper and lower case alpha and digit characters.
4 Arithmetic Includes all digits, arithmetic and arithmetic punctuation characters 0123456789+<=>%-.*/
5 ArithmeticSymbols Includes all arithmetic characters +<=>
6 Currency Includes all digits, currency and currency punctuation characters 0123456789$¢€£¥,.'-=
7 CurrencySymbols Includes all currency characters $¢€£¥
8 Digits Includes all digits as characters 0123456789
9 LowerCase Includes only lower case alpha characters abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêë
10 PhoneNumber Includes 0123456789-.+/EXText() so that a phone number's extension can be preceded by the individual characters x or X, or by ext or EXT.
11 Punctuation Includes only punctuation characters !"#%&'()*,-./:;?@[\]_{|}¡¿«»„
12 UpperCase Includes only upper case alpha characters A B C D E F G H I J K L M N O P Q R S T U V W X Y Z À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü
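The character lists in the table can be treated as plain sets to see how the broader sets contain the narrower ones and to test whether a value fits a given set. This is a Python sketch, not a call into the library: the constant names mirror the property names above, but the helper `fits` is hypothetical, and in this model a space is just another character that must be in the set.

```python
# Character lists copied from the table in this guide.
DIGITS = set("0123456789")
ARITHMETIC_SYMBOLS = set("+<=>")
ARITHMETIC = set("0123456789+<=>%-.*/")
CURRENCY_SYMBOLS = set("$¢€£¥")
CURRENCY = set("0123456789$¢€£¥,.'-=")
PHONE_NUMBER = set("0123456789-.+/EXText()")


def fits(text, character_set):
    """True when every character of `text` belongs to `character_set`."""
    return set(text) <= character_set


# The broader sets contain the narrower ones.
assert DIGITS | ARITHMETIC_SYMBOLS <= ARITHMETIC
assert DIGITS | CURRENCY_SYMBOLS <= CURRENCY

price_ok = fits("$1,299.00", CURRENCY)
phone_ok = fits("(813)555-0100x12", PHONE_NUMBER)
phone_with_space = fits("813 555 0100", PHONE_NUMBER)
```

Here the price and the phone number with an x extension fit their sets, while the phone number written with spaces does not, because the listed PhoneNumber characters do not include a space.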

Edit Character Sets

Optimal recognition results are obtained by using a character set that includes all of the characters that may be encountered, and only those characters.
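When the valid values of a field are known in advance, the "all and only" character set can be derived from them directly. The Python sketch below shows the idea; the function `exact_character_set` and the sample codes are hypothetical and are not part of SmartZone OCR.

```python
def exact_character_set(expected_values):
    """Return the set of characters that appear in any expected value."""
    chars = set()
    for value in expected_values:
        chars.update(value)
    return chars


# Hypothetical field that only ever holds one of these codes.
codes = ["A-100", "B-250", "C-975"]
field_chars = exact_character_set(codes)

# A sorted string is convenient for passing to an API that accepts characters in a string.
as_string = "".join(sorted(field_chars))
```

The resulting string contains only the hyphen, the digits that occur in the codes, and the letters A, B, and C.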

SmartZone OCR lets you reduce character sets to subsets, which increases accuracy, confidence, and speed, by using the Add and Remove methods of the CharacterSet class.

 


©2013. Accusoft Corporation. All Rights Reserved.